Filter columns and rows¶
The most recent data in the LendingClub dataset is from 2018, and since then, LendingClub has stopped operating as a peer-to-peer lender. Unsurprisingly, it's difficult to find explanations on the LendingClub website about the features in this dataset.
Sites not officially associated with LendingClub still contain information about the peer-to-peer service previously offered by LendingClub. As a result, the feature exploration for this project includes links to miscellaneous pages such as blogs.
Beginning with the current notebook, however, lack of detailed information about
features does impose some limits. For instance, rows containing certain values of
loan_status are filtered out simply because it is difficult to understand what those
values mean.
This notebook does the following:
- Explore features to determine what filtering should be done.
- Filter out certain columns and rows from the data on accepted loans.
The information available on rejected loans is fairly limited, so for now, I will limit attention to the data on accepted loans.
The data-cleaning steps developed in this notebook have been incorporated into the
project's Python package notebook_tools.
import numpy as np
import pandas as pd
import plotly.express as px
from IPython.display import display
from notebook_tools.data_cleaning import (
convert_acc_loan_data,
convert_rej_loan_data,
load_acc_loan_data,
load_acc_loan_feat_desc,
load_rej_loan_data,
)
from notebook_tools.feature_exploration import (
get_group_sizes,
get_value_counts,
style_loan_summary,
style_value_counts,
summarize_acc_loans,
summarize_loan_data,
)
Display feature summaries¶
Use functions in the package notebook_tools to load data and generate feature
summaries.
acc_loan_data = load_acc_loan_data(excluded_cols=["member_id"]).pipe(
convert_acc_loan_data
)
acc_loan_feat_desc = load_acc_loan_feat_desc()
rej_loan_data = load_rej_loan_data().pipe(convert_rej_loan_data)
Feature summaries for accepted loans¶
Total number of records: 2,260,701¶
print(f"The number of records for accepted loans is {len(acc_loan_data.index):,d}.")
The number of records for accepted loans is 2,260,701.
for dtype in [np.number, "string", "boolean"]:
summary = summarize_acc_loans(acc_loan_data, dtype, acc_loan_feat_desc)
print(f"\n\nThe number of features of type {dtype} is {len(summary.index)}.\n\n")
display(style_loan_summary(summary))
The number of features of type <class 'numpy.number'> is 112.
| count | mean | std | min | 25% | 50% | 75% | max | data type | description | |
|---|---|---|---|---|---|---|---|---|---|---|
| loan_amnt | 2,260,668 | 15,046.9 | 9,190.2 | 500.0 | 8,000.0 | 12,900.0 | 20,000.0 | 40,000.0 | Float64 | The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value. |
| funded_amnt | 2,260,668 | 15,041.7 | 9,188.4 | 500.0 | 8,000.0 | 12,875.0 | 20,000.0 | 40,000.0 | Float64 | The total amount committed to that loan at that point in time. |
| funded_amnt_inv | 2,260,668 | 15,023.4 | 9,192.3 | 0.0 | 8,000.0 | 12,800.0 | 20,000.0 | 40,000.0 | Float64 | The total amount committed by investors for that loan at that point in time. |
| term | 2,260,668 | 42.9 | 10.9 | 36.0 | 36.0 | 36.0 | 60.0 | 60.0 | Int64 | The number of payments on the loan. Values are in months and can be either 36 or 60. |
| int_rate | 2,260,668 | 13.1 | 4.8 | 5.3 | 9.5 | 12.6 | 16.0 | 31.0 | Float64 | Interest Rate on the loan |
| installment | 2,260,668 | 445.8 | 267.2 | 4.9 | 251.7 | 378.0 | 593.3 | 1,719.8 | Float64 | The monthly payment owed by the borrower if the loan originates. |
| annual_inc | 2,260,664 | 77,992.4 | 112,696.2 | 0.0 | 46,000.0 | 65,000.0 | 93,000.0 | 110,000,000.0 | Float64 | The self-reported annual income provided by the borrower during registration. |
| dti | 2,258,957 | 18.8 | 14.2 | -1.0 | 11.9 | 17.8 | 24.5 | 999.0 | Float64 | A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income. |
| delinq_2yrs | 2,260,639 | 0.3 | 0.9 | 0.0 | 0.0 | 0.0 | 0.0 | 58.0 | Int64 | The number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years |
| fico_range_low | 2,260,668 | 698.6 | 33.0 | 610.0 | 675.0 | 690.0 | 715.0 | 845.0 | Int64 | The lower boundary range the borrower’s FICO at loan origination belongs to. |
| fico_range_high | 2,260,668 | 702.6 | 33.0 | 614.0 | 679.0 | 694.0 | 719.0 | 850.0 | Int64 | The upper boundary range the borrower’s FICO at loan origination belongs to. |
| inq_last_6mths | 2,260,638 | 0.6 | 0.9 | 0.0 | 0.0 | 0.0 | 1.0 | 33.0 | Int64 | The number of inquiries in past 6 months (excluding auto and mortgage inquiries) |
| mths_since_last_delinq | 1,102,166 | 34.5 | 21.9 | 0.0 | 16.0 | 31.0 | 50.0 | 226.0 | Int64 | The number of months since the borrower's last delinquency. |
| mths_since_last_record | 359,156 | 72.3 | 26.5 | 0.0 | 55.0 | 74.0 | 92.0 | 129.0 | Int64 | The number of months since the last public record. |
| open_acc | 2,260,639 | 11.6 | 5.6 | 0.0 | 8.0 | 11.0 | 14.0 | 101.0 | Int64 | The number of open credit lines in the borrower's credit file. |
| pub_rec | 2,260,639 | 0.2 | 0.6 | 0.0 | 0.0 | 0.0 | 0.0 | 86.0 | Int64 | Number of derogatory public records |
| revol_bal | 2,260,668 | 16,658.5 | 22,948.3 | 0.0 | 5,950.0 | 11,324.0 | 20,246.0 | 2,904,836.0 | Float64 | Total credit revolving balance |
| revol_util | 2,258,866 | 50.3 | 24.7 | 0.0 | 31.5 | 50.3 | 69.4 | 892.3 | Float64 | Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit. |
| total_acc | 2,260,639 | 24.2 | 12.0 | 1.0 | 15.0 | 22.0 | 31.0 | 176.0 | Int64 | The total number of credit lines currently in the borrower's credit file |
| out_prncp | 2,260,668 | 4,206.9 | 7,343.2 | 0.0 | 0.0 | 0.0 | 6,149.9 | 40,000.0 | Float64 | Remaining outstanding principal for total amount funded |
| out_prncp_inv | 2,260,668 | 4,206.0 | 7,342.3 | 0.0 | 0.0 | 0.0 | 6,146.3 | 40,000.0 | Float64 | Remaining outstanding principal for portion of total amount funded by investors |
| total_pymnt | 2,260,668 | 12,082.6 | 9,901.4 | 0.0 | 4,546.5 | 9,329.7 | 16,940.9 | 63,296.9 | Float64 | Payments received to date for total amount funded |
| total_pymnt_inv | 2,260,668 | 12,064.4 | 9,897.0 | 0.0 | 4,531.8 | 9,309.7 | 16,916.7 | 63,296.9 | Float64 | Payments received to date for portion of total amount funded by investors |
| total_rec_prncp | 2,260,668 | 9,505.8 | 8,321.9 | 0.0 | 3,000.0 | 7,000.0 | 13,899.1 | 40,000.0 | Float64 | Principal received to date |
| total_rec_int | 2,260,668 | 2,431.4 | 2,679.7 | 0.0 | 728.2 | 1,525.9 | 3,108.1 | 28,192.5 | Float64 | Interest received to date |
| total_rec_late_fee | 2,260,668 | 1.5 | 11.8 | -0.0 | 0.0 | 0.0 | 0.0 | 1,484.3 | Float64 | Late fees received to date |
| recoveries | 2,260,668 | 143.9 | 748.2 | 0.0 | 0.0 | 0.0 | 0.0 | 39,859.6 | Float64 | post charge off gross recovery |
| collection_recovery_fee | 2,260,668 | 24.0 | 131.2 | 0.0 | 0.0 | 0.0 | 0.0 | 7,174.7 | Float64 | post charge off collection fee |
| last_pymnt_amnt | 2,260,668 | 3,429.3 | 6,018.2 | 0.0 | 310.3 | 600.6 | 3,743.8 | 42,192.1 | Float64 | Last total payment amount received |
| last_fico_range_high | 2,260,668 | 687.7 | 73.0 | 0.0 | 654.0 | 699.0 | 734.0 | 850.0 | Int64 | The upper boundary range the borrower’s last FICO pulled belongs to. |
| last_fico_range_low | 2,260,668 | 675.5 | 111.1 | 0.0 | 650.0 | 695.0 | 730.0 | 845.0 | Int64 | The lower boundary range the borrower’s last FICO pulled belongs to. |
| collections_12_mths_ex_med | 2,260,523 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 20.0 | Int64 | Number of collections in 12 months excluding medical collections |
| mths_since_last_major_derog | 580,775 | 44.2 | 21.5 | 0.0 | 27.0 | 44.0 | 62.0 | 226.0 | Int64 | Months since most recent 90-day or worse rating |
| annual_inc_joint | 120,710 | 123,624.6 | 74,161.3 | 5,693.5 | 83,400.0 | 110,000.0 | 147,995.0 | 7,874,821.0 | Float64 | The combined self-reported annual income provided by the co-borrowers during registration |
| dti_joint | 120,706 | 19.3 | 7.8 | 0.0 | 13.5 | 18.8 | 24.6 | 69.5 | Float64 | A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self-reported monthly income |
| acc_now_delinq | 2,260,639 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 14.0 | Int64 | The number of accounts on which the borrower is now delinquent. |
| tot_coll_amt | 2,190,392 | 232.7 | 8,518.5 | 0.0 | 0.0 | 0.0 | 0.0 | 9,152,545.0 | Float64 | Total collection amounts ever owed |
| tot_cur_bal | 2,190,392 | 142,492.2 | 160,692.6 | 0.0 | 29,092.0 | 79,240.0 | 213,204.0 | 9,971,659.0 | Float64 | Total current balance of all accounts |
| open_acc_6m | 1,394,538 | 0.9 | 1.1 | 0.0 | 0.0 | 1.0 | 1.0 | 18.0 | Int64 | Number of open trades in last 6 months |
| open_act_il | 1,394,539 | 2.8 | 3.0 | 0.0 | 1.0 | 2.0 | 3.0 | 57.0 | Int64 | Number of currently active installment trades |
| open_il_12m | 1,394,539 | 0.7 | 0.9 | 0.0 | 0.0 | 0.0 | 1.0 | 25.0 | Int64 | Number of installment accounts opened in past 12 months |
| open_il_24m | 1,394,539 | 1.6 | 1.6 | 0.0 | 0.0 | 1.0 | 2.0 | 51.0 | Int64 | Number of installment accounts opened in past 24 months |
| mths_since_rcnt_il | 1,350,744 | 21.2 | 26.0 | 0.0 | 7.0 | 13.0 | 24.0 | 511.0 | Int64 | Months since most recent installment accounts opened |
| total_bal_il | 1,394,539 | 35,506.6 | 44,097.5 | 0.0 | 8,695.0 | 23,127.0 | 46,095.0 | 1,837,038.0 | Float64 | Total current balance of all installment accounts |
| il_util | 1,191,818 | 69.1 | 23.7 | 0.0 | 55.0 | 72.0 | 86.0 | 1,000.0 | Float64 | Ratio of total current balance to high credit/credit limit on all install acct |
| open_rv_12m | 1,394,539 | 1.3 | 1.5 | 0.0 | 0.0 | 1.0 | 2.0 | 28.0 | Int64 | Number of revolving trades opened in past 12 months |
| open_rv_24m | 1,394,539 | 2.7 | 2.6 | 0.0 | 1.0 | 2.0 | 4.0 | 60.0 | Int64 | Number of revolving trades opened in past 24 months |
| max_bal_bc | 1,394,539 | 5,806.4 | 5,690.6 | 0.0 | 2,284.0 | 4,413.0 | 7,598.0 | 1,170,668.0 | Float64 | Maximum current balance owed on all revolving accounts |
| all_util | 1,394,320 | 57.0 | 20.9 | 0.0 | 43.0 | 58.0 | 72.0 | 239.0 | Float64 | Balance to credit limit on all trades |
| total_rev_hi_lim | 2,190,392 | 34,573.9 | 36,728.5 | 0.0 | 14,700.0 | 25,400.0 | 43,200.0 | 9,999,999.0 | Float64 | Total revolving high credit/credit limit |
| inq_fi | 1,394,539 | 1.0 | 1.5 | 0.0 | 0.0 | 1.0 | 1.0 | 48.0 | Int64 | Number of personal finance inquiries |
| total_cu_tl | 1,394,538 | 1.5 | 2.7 | 0.0 | 0.0 | 0.0 | 2.0 | 111.0 | Int64 | Number of finance trades |
| inq_last_12m | 1,394,538 | 2.0 | 2.4 | 0.0 | 0.0 | 1.0 | 3.0 | 67.0 | Int64 | Number of credit inquiries in past 12 months |
| acc_open_past_24mths | 2,210,638 | 4.5 | 3.2 | 0.0 | 2.0 | 4.0 | 6.0 | 64.0 | Int64 | Number of trades opened in past 24 months. |
| avg_cur_bal | 2,190,322 | 13,547.8 | 16,474.1 | 0.0 | 3,080.0 | 7,335.0 | 18,783.0 | 958,084.0 | Float64 | Average current balance of all accounts |
| bc_open_to_buy | 2,185,733 | 11,394.3 | 16,599.5 | 0.0 | 1,722.0 | 5,442.0 | 14,187.0 | 711,140.0 | Float64 | Total open to buy on revolving bankcards. |
| bc_util | 2,184,597 | 57.9 | 28.6 | 0.0 | 35.4 | 60.2 | 83.1 | 339.6 | Float64 | Ratio of total current balance to high credit/credit limit for all bankcard accounts. |
| chargeoff_within_12_mths | 2,260,523 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | Int64 | Number of charge-offs within 12 months |
| delinq_amnt | 2,260,639 | 12.4 | 726.5 | 0.0 | 0.0 | 0.0 | 0.0 | 249,925.0 | Float64 | The past-due amount owed for the accounts on which the borrower is now delinquent. |
| mo_sin_old_il_acct | 2,121,597 | 125.7 | 53.4 | 0.0 | 96.0 | 130.0 | 154.0 | 999.0 | Int64 | Months since oldest bank installment account opened |
| mo_sin_old_rev_tl_op | 2,190,391 | 181.5 | 97.1 | 1.0 | 116.0 | 164.0 | 232.0 | 999.0 | Int64 | Months since oldest revolving account opened |
| mo_sin_rcnt_rev_tl_op | 2,190,391 | 14.0 | 17.5 | 0.0 | 4.0 | 8.0 | 17.0 | 547.0 | Int64 | Months since most recent revolving account opened |
| mo_sin_rcnt_tl | 2,190,392 | 8.3 | 9.2 | 0.0 | 3.0 | 6.0 | 11.0 | 382.0 | Int64 | Months since most recent account opened |
| mort_acc | 2,210,638 | 1.6 | 1.9 | 0.0 | 0.0 | 1.0 | 3.0 | 94.0 | Int64 | Number of mortgage accounts. |
| mths_since_recent_bc | 2,187,256 | 24.8 | 32.3 | 0.0 | 6.0 | 14.0 | 30.0 | 661.0 | Int64 | Months since most recent bankcard account opened. |
| mths_since_recent_bc_dlq | 519,701 | 39.3 | 22.6 | 0.0 | 21.0 | 37.0 | 57.0 | 202.0 | Int64 | Months since most recent bankcard delinquency |
| mths_since_recent_inq | 1,965,233 | 7.0 | 6.0 | 0.0 | 2.0 | 5.0 | 11.0 | 25.0 | Int64 | Months since most recent inquiry. |
| mths_since_recent_revol_delinq | 740,359 | 35.8 | 22.3 | 0.0 | 17.0 | 33.0 | 51.0 | 202.0 | Int64 | Months since most recent revolving delinquency. |
| num_accts_ever_120_pd | 2,190,392 | 0.5 | 1.4 | 0.0 | 0.0 | 0.0 | 0.0 | 58.0 | Int64 | Number of accounts ever 120 or more days past due |
| num_actv_bc_tl | 2,190,392 | 3.7 | 2.3 | 0.0 | 2.0 | 3.0 | 5.0 | 50.0 | Int64 | Number of currently active bankcard accounts |
| num_actv_rev_tl | 2,190,392 | 5.6 | 3.4 | 0.0 | 3.0 | 5.0 | 7.0 | 72.0 | Int64 | Number of currently active revolving trades |
| num_bc_sats | 2,202,078 | 4.8 | 3.0 | 0.0 | 3.0 | 4.0 | 6.0 | 71.0 | Int64 | Number of satisfactory bankcard accounts |
| num_bc_tl | 2,190,392 | 7.7 | 4.7 | 0.0 | 4.0 | 7.0 | 10.0 | 86.0 | Int64 | Number of bankcard accounts |
| num_il_tl | 2,190,392 | 8.4 | 7.4 | 0.0 | 3.0 | 6.0 | 11.0 | 159.0 | Int64 | Number of installment accounts |
| num_op_rev_tl | 2,190,392 | 8.2 | 4.7 | 0.0 | 5.0 | 7.0 | 10.0 | 91.0 | Int64 | Number of open revolving accounts |
| num_rev_accts | 2,190,391 | 14.0 | 8.0 | 0.0 | 8.0 | 12.0 | 18.0 | 151.0 | Int64 | Number of revolving accounts |
| num_rev_tl_bal_gt_0 | 2,190,392 | 5.6 | 3.3 | 0.0 | 3.0 | 5.0 | 7.0 | 65.0 | Int64 | Number of revolving trades with balance >0 |
| num_sats | 2,202,078 | 11.6 | 5.6 | 0.0 | 8.0 | 11.0 | 14.0 | 101.0 | Int64 | Number of satisfactory accounts |
| num_tl_120dpd_2m | 2,107,011 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | Int64 | Number of accounts currently 120 days past due (updated in past 2 months) |
| num_tl_30dpd | 2,190,392 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | Int64 | Number of accounts currently 30 days past due (updated in past 2 months) |
| num_tl_90g_dpd_24m | 2,190,392 | 0.1 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 58.0 | Int64 | Number of accounts 90 or more days past due in last 24 months |
| num_tl_op_past_12m | 2,190,392 | 2.1 | 1.8 | 0.0 | 1.0 | 2.0 | 3.0 | 32.0 | Int64 | Number of accounts opened in past 12 months |
| pct_tl_nvr_dlq | 2,190,237 | 94.1 | 9.0 | 0.0 | 91.3 | 100.0 | 100.0 | 100.0 | Float64 | Percent of trades never delinquent |
| percent_bc_gt_75 | 2,185,289 | 42.4 | 36.2 | 0.0 | 0.0 | 37.5 | 71.4 | 100.0 | Float64 | Percentage of all bankcard accounts > 75% of limit. |
| pub_rec_bankruptcies | 2,259,303 | 0.1 | 0.4 | 0.0 | 0.0 | 0.0 | 0.0 | 12.0 | Int64 | Number of public record bankruptcies |
| tax_liens | 2,260,563 | 0.0 | 0.4 | 0.0 | 0.0 | 0.0 | 0.0 | 85.0 | Int64 | Number of tax liens |
| tot_hi_cred_lim | 2,190,392 | 178,242.8 | 181,574.8 | 0.0 | 50,731.0 | 114,298.5 | 257,755.0 | 9,999,999.0 | Float64 | Total high credit/credit limit |
| total_bal_ex_mort | 2,210,638 | 51,022.9 | 49,911.2 | 0.0 | 20,892.0 | 37,864.0 | 64,350.0 | 3,408,095.0 | Float64 | Total credit balance excluding mortgage |
| total_bc_limit | 2,210,638 | 23,193.8 | 23,006.6 | 0.0 | 8,300.0 | 16,300.0 | 30,300.0 | 1,569,000.0 | Float64 | Total bankcard high credit/credit limit |
| total_il_high_credit_limit | 2,190,392 | 43,732.0 | 45,073.0 | 0.0 | 15,000.0 | 32,696.0 | 58,804.2 | 2,118,996.0 | Float64 | Total installment high credit/credit limit |
| revol_bal_joint | 108,020 | 33,617.3 | 28,153.9 | 0.0 | 15,106.8 | 26,516.5 | 43,769.0 | 1,110,019.0 | Float64 | Sum of revolving credit balance of the co-borrowers, net of duplicate balances |
| sec_app_fico_range_low | 108,021 | 669.8 | 44.7 | 540.0 | 645.0 | 670.0 | 695.0 | 845.0 | Int64 | FICO range (high) for the secondary applicant |
| sec_app_fico_range_high | 108,021 | 673.8 | 44.7 | 544.0 | 649.0 | 674.0 | 699.0 | 850.0 | Int64 | FICO range (low) for the secondary applicant |
| sec_app_inq_last_6mths | 108,021 | 0.6 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 6.0 | Int64 | Credit inquiries in the last 6 months at time of application for the secondary applicant |
| sec_app_mort_acc | 108,021 | 1.5 | 1.8 | 0.0 | 0.0 | 1.0 | 2.0 | 27.0 | Int64 | Number of mortgage accounts at time of application for the secondary applicant |
| sec_app_open_acc | 108,021 | 11.5 | 6.6 | 0.0 | 7.0 | 10.0 | 15.0 | 82.0 | Int64 | Number of open trades at time of application for the secondary applicant |
| sec_app_revol_util | 106,184 | 58.2 | 25.5 | 0.0 | 39.8 | 60.2 | 78.6 | 434.3 | Float64 | Ratio of total current balance to high credit/credit limit for all revolving accounts |
| sec_app_open_act_il | 108,021 | 3.0 | 3.3 | 0.0 | 1.0 | 2.0 | 4.0 | 43.0 | Int64 | Number of currently active installment trades at time of application for the secondary applicant |
| sec_app_num_rev_accts | 108,021 | 12.5 | 8.2 | 0.0 | 7.0 | 11.0 | 17.0 | 106.0 | Int64 | Number of revolving accounts at time of application for the secondary applicant |
| sec_app_chargeoff_within_12_mths | 108,021 | 0.0 | 0.4 | 0.0 | 0.0 | 0.0 | 0.0 | 21.0 | Int64 | Number of charge-offs within last 12 months at time of application for the secondary applicant |
| sec_app_collections_12_mths_ex_med | 108,021 | 0.1 | 0.4 | 0.0 | 0.0 | 0.0 | 0.0 | 23.0 | Int64 | Number of collections within last 12 months excluding medical collections at time of application for the secondary applicant |
| sec_app_mths_since_last_major_derog | 35,942 | 36.9 | 23.9 | 0.0 | 16.0 | 36.0 | 56.0 | 185.0 | Int64 | Months since most recent 90-day or worse rating at time of application for the secondary applicant |
| deferral_term | 10,917 | 3.0 | 0.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | Int64 | Amount of months that the borrower is expected to pay less than the contractual monthly payment amount due to a hardship plan |
| hardship_amount | 10,917 | 155.0 | 129.0 | 0.6 | 59.4 | 119.1 | 213.3 | 943.9 | Float64 | The interest payment that the borrower has committed to make each month while they are on a hardship plan |
| hardship_length | 10,917 | 3.0 | 0.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | Int64 | The number of months the borrower will make smaller payments than normally obligated due to a hardship plan |
| hardship_dpd | 10,917 | 13.7 | 9.7 | 0.0 | 5.0 | 15.0 | 22.0 | 37.0 | Int64 | Account days past due as of the hardship plan start date |
| orig_projected_additional_accrued_interest | 8,651 | 454.8 | 375.4 | 1.9 | 175.2 | 352.8 | 620.2 | 2,680.9 | Float64 | The original projected additional interest amount that will accrue for the given hardship payment plan as of the Hardship Start Date. This field will be null if the borrower has broken their hardship payment plan. |
| hardship_payoff_balance_amount | 10,917 | 11,636.9 | 7,626.0 | 55.7 | 5,627.0 | 10,028.4 | 16,151.9 | 40,306.4 | Float64 | The payoff balance amount as of the hardship plan start date |
| hardship_last_payment_amount | 10,917 | 194.0 | 198.6 | 0.0 | 44.4 | 133.2 | 284.2 | 1,407.9 | Float64 | The last payment amount as of the hardship plan start date |
| settlement_amount | 34,246 | 5,010.7 | 3,693.1 | 44.2 | 2,208.0 | 4,146.1 | 6,850.2 | 33,601.0 | Float64 | The loan amount that the borrower has agreed to settle for |
| settlement_percentage | 34,246 | 47.8 | 7.3 | 0.2 | 45.0 | 45.0 | 50.0 | 521.4 | Float64 | The settlement amount as a percentage of the payoff balance amount on the loan |
| settlement_term | 34,246 | 13.2 | 8.2 | 0.0 | 6.0 | 14.0 | 18.0 | 181.0 | Int64 | The number of months that the borrower will be on the settlement plan |
The number of features of type string is 35.
| count | unique | top | freq | data type | description | |
|---|---|---|---|---|---|---|
| id | 2,260,701 | 2,260,701 | 68407277 | 1 | string | A unique LC assigned ID for the loan listing. |
| grade | 2,260,668 | 7 | B | 663,557 | string | LC assigned loan grade |
| sub_grade | 2,260,668 | 35 | C1 | 145,903 | string | LC assigned loan subgrade |
| emp_title | 2,093,699 | 512,694 | Teacher | 38,824 | string | The job title supplied by the Borrower when applying for the loan.* |
| emp_length | 2,113,761 | 11 | 10+ years | 748,005 | string | Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years. |
| home_ownership | 2,260,668 | 6 | MORTGAGE | 1,111,450 | string | The home ownership status provided by the borrower during registration or obtained from the credit report. Our values are: RENT, OWN, MORTGAGE, OTHER |
| verification_status | 2,260,668 | 3 | Source Verified | 886,231 | string | Indicates if income was verified by LC, not verified, or if the income source was verified |
| issue_d | 2,260,668 | 139 | 2016-03 | 61,992 | string | The month which the loan was funded |
| loan_status | 2,260,668 | 9 | Fully Paid | 1,076,751 | string | Current status of the loan |
| url | 2,260,668 | 2,260,668 | https://lendingclub.com/browse/loanDetail.action?loan_id=68407277 | 1 | string | URL for the LC page with listing data. |
| desc | 126,065 | 124,500 | 252 | string | Loan description provided by the borrower | |
| purpose | 2,260,668 | 14 | debt_consolidation | 1,277,877 | string | A category provided by the borrower for the loan request. |
| title | 2,237,342 | 63,154 | Debt consolidation | 1,153,293 | string | The loan title provided by the borrower |
| zip_code | 2,260,667 | 956 | 112xx | 23,908 | string | The first 3 numbers of the zip code provided by the borrower in the loan application. |
| addr_state | 2,260,668 | 51 | CA | 314,533 | string | The state provided by the borrower in the loan application |
| earliest_cr_line | 2,260,639 | 754 | 2004-09 | 15,400 | string | The month the borrower's earliest reported credit line was opened |
| initial_list_status | 2,260,668 | 2 | w | 1,535,467 | string | The initial listing status of the loan. Possible values are – W, F |
| last_pymnt_d | 2,258,241 | 136 | 2019-03 | 853,003 | string | Last month payment was received |
| next_pymnt_d | 915,358 | 106 | 2019-04 | 912,221 | string | Next scheduled payment date |
| last_credit_pull_d | 2,260,596 | 141 | 2019-03 | 1,371,381 | string | The most recent month LC pulled credit for this loan |
| policy_code | 2,260,668 | 1 | 1.0 | 2,260,668 | string | publicly available policy_code=1 new products not publicly available policy_code=2 |
| application_type | 2,260,668 | 2 | Individual | 2,139,958 | string | Indicates whether the loan is an individual application or a joint application with two co-borrowers |
| verification_status_joint | 115,730 | 3 | Not Verified | 57,403 | string | Indicates if the co-borrowers' joint income was verified by LC, not verified, or if the income source was verified |
| sec_app_earliest_cr_line | 108,021 | 663 | 2006-08 | 998 | string | Earliest credit line at time of application for the secondary applicant |
| hardship_type | 10,917 | 1 | INTEREST ONLY-3 MONTHS DEFERRAL | 10,917 | string | Describes the hardship plan offering |
| hardship_reason | 10,917 | 9 | NATURAL_DISASTER | 2,965 | string | Describes the reason the hardship plan was offered |
| hardship_status | 10,917 | 3 | COMPLETED | 7,819 | string | Describes if the hardship plan is active, pending, canceled, completed, or broken |
| hardship_start_date | 10,917 | 27 | 2017-09 | 2,444 | string | The start date of the hardship plan period |
| hardship_end_date | 10,917 | 28 | 2017-12 | 1,756 | string | The end date of the hardship plan period |
| payment_plan_start_date | 10,917 | 27 | 2017-09 | 1,715 | string | The day the first hardship plan payment is due. For example, if a borrower has a hardship plan period of 3 months, the start date is the start of the three-month period in which the borrower is allowed to make interest-only payments. |
| hardship_loan_status | 10,917 | 5 | Late (16-30 days) | 4,770 | string | Loan Status as of the hardship plan start date |
| disbursement_method | 2,260,668 | 2 | Cash | 2,182,546 | string | The method by which the borrower receives their loan. Possible values are: CASH, DIRECT_PAY |
| debt_settlement_flag_date | 34,246 | 83 | 2019-02 | 2,606 | string | The most recent date that the Debt_Settlement_Flag has been set |
| settlement_status | 34,246 | 3 | ACTIVE | 14,704 | string | The status of the borrower’s settlement plan. Possible values are: COMPLETE, ACTIVE, BROKEN, CANCELLED, DENIED, DRAFT |
| settlement_date | 34,246 | 90 | 2019-01 | 1,710 | string | The date that the borrower agrees to the settlement plan |
The number of features of type boolean is 3.
| count | unique | top | freq | data type | description | |
|---|---|---|---|---|---|---|
| pymnt_plan | 2,260,668 | 2 | False | 2,260,048 | boolean | Indicates if a payment plan has been put in place for the loan |
| hardship_flag | 2,260,668 | 2 | False | 2,259,836 | boolean | Flags whether or not the borrower is on a hardship plan |
| debt_settlement_flag | 2,260,668 | 2 | False | 2,226,422 | boolean | Flags whether or not the borrower, who has charged-off, is working with a debt-settlement company. |
Feature summaries for rejected loans¶
Total number of records: 27,648,741¶
print(f"The number of records for accepted loans is {len(rej_loan_data.index):,d}.")
The number of records for accepted loans is 27,648,741.
for dtype in [np.number, "string"]:
summary = summarize_loan_data(rej_loan_data, dtype)
display(style_loan_summary(summary))
| count | mean | std | min | 25% | 50% | 75% | max | data type | |
|---|---|---|---|---|---|---|---|---|---|
| Amount Requested | 27,648,741 | 13,133.2 | 15,009.6 | 0.0 | 4,800.0 | 10,000.0 | 20,000.0 | 1,400,000.0 | Float64 |
| Risk_Score | 9,151,111 | 628.2 | 89.9 | 0.0 | 591.0 | 637.0 | 675.0 | 990.0 | Float64 |
| Debt-To-Income Ratio | 27,648,741 | 143.3 | 10,539.2 | -1.0 | 8.1 | 20.0 | 36.6 | 50,000,031.5 | Float64 |
| count | unique | top | freq | data type | |
|---|---|---|---|---|---|
| Application Date | 27,648,741 | 4,238 | 2018-12-04 | 42,112 | string |
| Loan Title | 27,647,436 | 73,927 | Debt consolidation | 6,418,016 | string |
| Zip Code | 27,648,448 | 1,001 | 112xx | 267,102 | string |
| State | 27,648,719 | 51 | CA | 3,242,169 | string |
| Employment Length | 26,697,386 | 11 | < 1 year | 22,994,315 | string |
| Policy Code | 27,647,823 | 2 | 0.0 | 27,559,694 | string |
Explore features¶
policy_code / Policy Code¶
What do the columns policy_code (for accepted loans) and Policy Code (for rejected
loans) refer to?
From "What are these Policy Code 2 Loans at Lending Club?":
- These [Policy Code 2 loans] are loans made to borrowers that do not meet Lending Club’s current credit policy standards.
- The FICO scores on these borrowers are typically 640-659, below the 660 threshold on Policy Code 1 loans.
- These loans are made available to select institutional investors who have a great deal of experience with consumer loans in this credit spectrum and with Lending Club.
policy_code_counts = get_value_counts(acc_loan_data["policy_code"])
display(style_value_counts(policy_code_counts))
| count | |
|---|---|
| policy_code | |
| 1.0 | 2,260,668 |
| <NA> | 33 |
policy_code_counts_rej = get_value_counts(rej_loan_data["Policy Code"])
display(style_value_counts(policy_code_counts_rej))
| count | |
|---|---|
| Policy Code | |
| 0.0 | 27,559,694 |
| 2.0 | 88,129 |
| <NA> | 918 |
loan_status¶
What are the distinct values for the column loan_status?
loan_status_counts = get_value_counts(acc_loan_data["loan_status"])
display(style_value_counts(loan_status_counts))
| count | |
|---|---|
| loan_status | |
| Fully Paid | 1,076,751 |
| Current | 878,317 |
| Charged Off | 268,559 |
| Late (31-120 days) | 21,467 |
| In Grace Period | 8,436 |
| Late (16-30 days) | 4,349 |
| Does not meet the credit policy. Status:Fully Paid | 1,988 |
| Does not meet the credit policy. Status:Charged Off | 761 |
| Default | 40 |
| <NA> | 33 |
The 33 rows that have NA for loan_status also have NA for all other features other
than id, so these rows can be filtered from the data.
From the values of id displayed in the output of next cell, these null rows appear to
be associated with the policy code.
missing_status = acc_loan_data[acc_loan_data["loan_status"].isna()]
display(missing_status.head(4).transpose())
| 421095 | 421096 | 528961 | 528962 | |
|---|---|---|---|---|
| id | Total amount funded in policy code 1: 6417608175 | Total amount funded in policy code 2: 1944088810 | Total amount funded in policy code 1: 1741781700 | Total amount funded in policy code 2: 564202131 |
| loan_amnt | <NA> | <NA> | <NA> | <NA> |
| funded_amnt | <NA> | <NA> | <NA> | <NA> |
| funded_amnt_inv | <NA> | <NA> | <NA> | <NA> |
| term | <NA> | <NA> | <NA> | <NA> |
| ... | ... | ... | ... | ... |
| settlement_status | NaN | NaN | NaN | NaN |
| settlement_date | NaN | NaN | NaN | NaN |
| settlement_amount | NaN | NaN | NaN | NaN |
| settlement_percentage | NaN | NaN | NaN | NaN |
| settlement_term | NaN | NaN | NaN | NaN |
150 rows × 4 columns
As a check, verify that if the id column is dropped, then all values are NA in rows
that are missing loan_status.
display(missing_status.drop("id", axis="columns").count().sum())
0
Create a dataframe that has these empty rows filtered out. After additional filtering, this dataframe will be used to recreate the SQLite database.
filtered_loan_data = acc_loan_data[acc_loan_data["loan_status"].notna()]
loan_status_counts = get_value_counts(filtered_loan_data["loan_status"])
display(style_value_counts(loan_status_counts))
| count | |
|---|---|
| loan_status | |
| Fully Paid | 1,076,751 |
| Current | 878,317 |
| Charged Off | 268,559 |
| Late (31-120 days) | 21,467 |
| In Grace Period | 8,436 |
| Late (16-30 days) | 4,349 |
| Does not meet the credit policy. Status:Fully Paid | 1,988 |
| Does not meet the credit policy. Status:Charged Off | 761 |
| Default | 40 |
Note that after the rows with missing loan_status have been filtered out, there are no
missing values for policy_code. Since all rows have the same value for policy_code,
this column can be dropped.
policy_code_counts = get_value_counts(filtered_loan_data["policy_code"])
display(style_value_counts(policy_code_counts))
| count | |
|---|---|
| policy_code | |
| 1.0 | 2,260,668 |
Most of the values for loan_status are explained at "What Do the Different Note Statuses
Mean?".
However, the values Does not meet the credit policy. Status:Fully Paid and Does not meet the credit policy. Status:Charged Off are unclear. Let's take look at a random
sample of the rows that have these value of loan status.
bool_index = filtered_loan_data["loan_status"].str.endswith("Status:Fully Paid")
sampled_data = filtered_loan_data[bool_index].sample(
n=5, random_state=59147, axis="index"
)
with pd.option_context("display.max_columns", None):
display(sampled_data)
| id | loan_amnt | funded_amnt | funded_amnt_inv | term | int_rate | installment | grade | sub_grade | emp_title | emp_length | home_ownership | annual_inc | verification_status | issue_d | loan_status | pymnt_plan | url | desc | purpose | title | zip_code | addr_state | dti | delinq_2yrs | earliest_cr_line | fico_range_low | fico_range_high | inq_last_6mths | mths_since_last_delinq | mths_since_last_record | open_acc | pub_rec | revol_bal | revol_util | total_acc | initial_list_status | out_prncp | out_prncp_inv | total_pymnt | total_pymnt_inv | total_rec_prncp | total_rec_int | total_rec_late_fee | recoveries | collection_recovery_fee | last_pymnt_d | last_pymnt_amnt | next_pymnt_d | last_credit_pull_d | last_fico_range_high | last_fico_range_low | collections_12_mths_ex_med | mths_since_last_major_derog | policy_code | application_type | annual_inc_joint | dti_joint | verification_status_joint | acc_now_delinq | tot_coll_amt | tot_cur_bal | open_acc_6m | open_act_il | open_il_12m | open_il_24m | mths_since_rcnt_il | total_bal_il | il_util | open_rv_12m | open_rv_24m | max_bal_bc | all_util | total_rev_hi_lim | inq_fi | total_cu_tl | inq_last_12m | acc_open_past_24mths | avg_cur_bal | bc_open_to_buy | bc_util | chargeoff_within_12_mths | delinq_amnt | mo_sin_old_il_acct | mo_sin_old_rev_tl_op | mo_sin_rcnt_rev_tl_op | mo_sin_rcnt_tl | mort_acc | mths_since_recent_bc | mths_since_recent_bc_dlq | mths_since_recent_inq | mths_since_recent_revol_delinq | num_accts_ever_120_pd | num_actv_bc_tl | num_actv_rev_tl | num_bc_sats | num_bc_tl | num_il_tl | num_op_rev_tl | num_rev_accts | num_rev_tl_bal_gt_0 | num_sats | num_tl_120dpd_2m | num_tl_30dpd | num_tl_90g_dpd_24m | num_tl_op_past_12m | pct_tl_nvr_dlq | percent_bc_gt_75 | pub_rec_bankruptcies | tax_liens | tot_hi_cred_lim | total_bal_ex_mort | total_bc_limit | total_il_high_credit_limit | revol_bal_joint | sec_app_fico_range_low | sec_app_fico_range_high | sec_app_earliest_cr_line | sec_app_inq_last_6mths | sec_app_mort_acc | sec_app_open_acc | sec_app_revol_util | sec_app_open_act_il | sec_app_num_rev_accts | sec_app_chargeoff_within_12_mths | sec_app_collections_12_mths_ex_med | sec_app_mths_since_last_major_derog | hardship_flag | hardship_type | hardship_reason | hardship_status | deferral_term | hardship_amount | hardship_start_date | hardship_end_date | payment_plan_start_date | hardship_length | hardship_dpd | hardship_loan_status | orig_projected_additional_accrued_interest | hardship_payoff_balance_amount | hardship_last_payment_amount | disbursement_method | debt_settlement_flag | debt_settlement_flag_date | settlement_status | settlement_date | settlement_amount | settlement_percentage | settlement_term | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1653100 | 389258 | 5000.0 | 5000.0 | 4688.548225 | 36 | 16.0 | 175.79 | E | E2 | Mustang Harrys Restaurant | 2 years | RENT | 21000.0 | Not Verified | 2009-04 | Does not meet the credit policy. Status:Fully ... | False | https://lendingclub.com/browse/loanDetail.acti... | i am trying to buy a car and pay for the insur... | major_purchase | personal | 104xx | NY | 14.0 | 0 | 2006-11 | 680 | 684 | 2 | <NA> | <NA> | 14 | 0 | 6348.0 | 52.0 | 18 | f | 0.0 | 0.0 | 6328.258462 | 5943.66 | 4999.99 | 1328.26 | 0.0 | 0.0 | 0.0 | 2012-04 | 197.99 | 2012-05 | 2012-04 | 714 | 710 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 1652370 | 507550 | 6400.0 | 6400.0 | 6400.0 | 36 | 12.73 | 214.83 | C | C1 | Advanced Cash Register Systems Inc | 1 year | MORTGAGE | 43000.0 | Not Verified | 2010-04 | Does not meet the credit policy. Status:Fully ... | False | https://lendingclub.com/browse/loanDetail.acti... | Borrower added on 04/22/10 > Our main object... | debt_consolidation | Debt Consolidation loan | 531xx | WI | 8.4 | 0 | 1994-09 | 695 | 699 | 4 | <NA> | 80 | 7 | 1 | 4686.0 | 27.7 | 30 | f | 0.0 | 0.0 | 7734.435939 | 7734.44 | 6400.0 | 1334.44 | 0.0 | 0.0 | 0.0 | 2013-05 | 244.7 | 2013-06 | 2019-03 | 654 | 650 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 1652294 | 522251 | 6000.0 | 6000.0 | 6000.0 | 60 | 19.04 | 155.78 | F | F3 | Blueline Software Services | 1 year | MORTGAGE | 60000.0 | Not Verified | 2010-05 | Does not meet the credit policy. Status:Fully ... | False | https://lendingclub.com/browse/loanDetail.acti... | Borrower added on 05/26/10 > I plan on using... | other | Financial Freedom | 070xx | NJ | 9.54 | 1 | 1999-05 | 675 | 679 | 4 | 14 | <NA> | 18 | 0 | 7376.0 | 53.8 | 28 | f | 0.0 | 0.0 | 7846.999584 | 7847.0 | 6000.0 | 1847.0 | 0.0 | 0.0 | 0.0 | 2012-04 | 4581.37 | 2012-05 | 2012-04 | 624 | 620 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 1651865 | 601025 | 2800.0 | 2800.0 | 2800.0 | 36 | 14.46 | 96.33 | D | D2 | Plasma Biological Services | 3 years | OWN | 41000.0 | Source Verified | 2010-10 | Does not meet the credit policy. Status:Fully ... | False | https://lendingclub.com/browse/loanDetail.acti... | Borrower added on 10/19/10 > Lenders:<br/>I ... | debt_consolidation | Good Bye Credit Card Debt! | 646xx | MO | 20.81 | 1 | 2003-08 | 665 | 669 | 6 | 11 | <NA> | 20 | 0 | 3442.0 | 35.5 | 24 | f | 0.0 | 0.0 | 3464.22895 | 3464.23 | 2800.0 | 664.23 | 0.0 | 0.0 | 0.0 | 2013-09 | 285.98 | 2013-10 | 2019-03 | 664 | 660 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 1652535 | 489091 | 14725.0 | 14725.0 | 14725.0 | 36 | 14.22 | 504.84 | C | C5 | TradeStation Securities | 1 year | MORTGAGE | 60000.0 | Not Verified | 2010-03 | Does not meet the credit policy. Status:Fully ... | False | https://lendingclub.com/browse/loanDetail.acti... | Borrower added on 02/25/10 > 1. I am employe... | debt_consolidation | You make money while I save on interest | 330xx | FL | 9.09 | 0 | 2005-10 | 700 | 704 | 4 | <NA> | <NA> | 11 | 0 | 226.0 | 3.4 | 14 | f | 0.0 | 0.0 | 18174.993854 | 18174.99 | 14724.99 | 3450.0 | 0.0 | 0.0 | 0.0 | 2013-03 | 551.17 | 2013-04 | 2013-03 | 639 | 635 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
bool_index = filtered_loan_data["loan_status"].str.endswith("Status:Charged Off")
sampled_data = filtered_loan_data[bool_index].sample(
n=5, random_state=59147, axis="index"
)
with pd.option_context("display.max_columns", None):
display(sampled_data)
| id | loan_amnt | funded_amnt | funded_amnt_inv | term | int_rate | installment | grade | sub_grade | emp_title | emp_length | home_ownership | annual_inc | verification_status | issue_d | loan_status | pymnt_plan | url | desc | purpose | title | zip_code | addr_state | dti | delinq_2yrs | earliest_cr_line | fico_range_low | fico_range_high | inq_last_6mths | mths_since_last_delinq | mths_since_last_record | open_acc | pub_rec | revol_bal | revol_util | total_acc | initial_list_status | out_prncp | out_prncp_inv | total_pymnt | total_pymnt_inv | total_rec_prncp | total_rec_int | total_rec_late_fee | recoveries | collection_recovery_fee | last_pymnt_d | last_pymnt_amnt | next_pymnt_d | last_credit_pull_d | last_fico_range_high | last_fico_range_low | collections_12_mths_ex_med | mths_since_last_major_derog | policy_code | application_type | annual_inc_joint | dti_joint | verification_status_joint | acc_now_delinq | tot_coll_amt | tot_cur_bal | open_acc_6m | open_act_il | open_il_12m | open_il_24m | mths_since_rcnt_il | total_bal_il | il_util | open_rv_12m | open_rv_24m | max_bal_bc | all_util | total_rev_hi_lim | inq_fi | total_cu_tl | inq_last_12m | acc_open_past_24mths | avg_cur_bal | bc_open_to_buy | bc_util | chargeoff_within_12_mths | delinq_amnt | mo_sin_old_il_acct | mo_sin_old_rev_tl_op | mo_sin_rcnt_rev_tl_op | mo_sin_rcnt_tl | mort_acc | mths_since_recent_bc | mths_since_recent_bc_dlq | mths_since_recent_inq | mths_since_recent_revol_delinq | num_accts_ever_120_pd | num_actv_bc_tl | num_actv_rev_tl | num_bc_sats | num_bc_tl | num_il_tl | num_op_rev_tl | num_rev_accts | num_rev_tl_bal_gt_0 | num_sats | num_tl_120dpd_2m | num_tl_30dpd | num_tl_90g_dpd_24m | num_tl_op_past_12m | pct_tl_nvr_dlq | percent_bc_gt_75 | pub_rec_bankruptcies | tax_liens | tot_hi_cred_lim | total_bal_ex_mort | total_bc_limit | total_il_high_credit_limit | revol_bal_joint | sec_app_fico_range_low | sec_app_fico_range_high | sec_app_earliest_cr_line | sec_app_inq_last_6mths | sec_app_mort_acc | sec_app_open_acc | sec_app_revol_util | sec_app_open_act_il | sec_app_num_rev_accts | sec_app_chargeoff_within_12_mths | sec_app_collections_12_mths_ex_med | sec_app_mths_since_last_major_derog | hardship_flag | hardship_type | hardship_reason | hardship_status | deferral_term | hardship_amount | hardship_start_date | hardship_end_date | payment_plan_start_date | hardship_length | hardship_dpd | hardship_loan_status | orig_projected_additional_accrued_interest | hardship_payoff_balance_amount | hardship_last_payment_amount | disbursement_method | debt_settlement_flag | debt_settlement_flag_date | settlement_status | settlement_date | settlement_amount | settlement_percentage | settlement_term | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1654218 | 145926 | 10475.0 | 10475.0 | 5925.003199 | 36 | 14.38 | 359.95 | E | E4 | Sampson Community College | 5 years | OWN | 76500.0 | Not Verified | 2007-11 | Does not meet the credit policy. Status:Charge... | False | https://lendingclub.com/browse/loanDetail.acti... | I would like to consolidate credit card and lo... | credit_card | Harold | 283xx | NC | 9.98 | 0 | 1994-12 | 640 | 644 | 6 | 71 | 0 | 9 | 0 | 15918.0 | 101.4 | 16 | f | 0.0 | 0.0 | 6763.02 | 3823.8 | 4662.27 | 1806.22 | 29.962676 | 264.57 | 2.73 | 2009-06 | 359.95 | 2009-12 | 2017-02 | 504 | 500 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 1652310 | 519096 | 10000.0 | 10000.0 | 9975.0 | 60 | 17.93 | 253.56 | E | E5 | WORLD MICRO | 4 years | OWN | 70000.0 | Not Verified | 2010-05 | Does not meet the credit policy. Status:Charge... | False | https://lendingclub.com/browse/loanDetail.acti... | Borrower added on 05/17/10 > I am looking to... | debt_consolidation | VETERAN CREDIT CARD RELIEF | 301xx | GA | 10.8 | 0 | 1998-11 | 665 | 669 | 5 | 57 | 64 | 7 | 1 | 11838.0 | 56.1 | 15 | f | 0.0 | 0.0 | 13902.02 | 13867.36 | 8538.74 | 5123.97 | 0.0 | 239.31 | 37.7604 | 2014-12 | 253.56 | 2015-05 | 2016-10 | 589 | 585 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 1651959 | 584984 | 10000.0 | 10000.0 | 10000.0 | 36 | 13.61 | 339.89 | C | C2 | <NA> | 4 years | MORTGAGE | 57600.0 | Verified | 2010-09 | Does not meet the credit policy. Status:Charge... | False | https://lendingclub.com/browse/loanDetail.acti... | Borrower added on 09/22/10 > I will use the ... | debt_consolidation | personal loan | 324xx | FL | 9.33 | 0 | 2000-11 | 695 | 699 | 4 | <NA> | <NA> | 19 | 0 | 7162.0 | 31.6 | 28 | f | 0.0 | 0.0 | 3059.85 | 3059.85 | 2121.23 | 920.59 | 0.0 | 18.03 | 5.5 | 2011-07 | 339.89 | 2011-11 | 2019-03 | 634 | 630 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 1652322 | 517699 | 6000.0 | 6000.0 | 5975.0 | 60 | 17.93 | 152.14 | E | E5 | Sepaton | 4 years | RENT | 103000.0 | Not Verified | 2010-05 | Does not meet the credit policy. Status:Charge... | False | https://lendingclub.com/browse/loanDetail.acti... | Borrower added on 05/14/10 > Expanding our i... | small_business | Business expansion | 018xx | MA | 7.07 | 1 | 1988-09 | 665 | 669 | 6 | 10 | <NA> | 5 | 0 | 54.0 | 13.5 | 10 | f | 0.0 | 0.0 | 4920.97 | 4900.53 | 2433.82 | 2268.26 | 0.0 | 218.89 | 2.18 | 2013-01 | 152.14 | 2013-05 | 2016-10 | 519 | 515 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 1652419 | 501230 | 15000.0 | 15000.0 | 14628.44 | 36 | 14.59 | 516.98 | D | D1 | Van-Dee Manufacturing Co. | 10+ years | MORTGAGE | 65000.0 | Verified | 2010-04 | Does not meet the credit policy. Status:Charge... | False | https://lendingclub.com/browse/loanDetail.acti... | Borrower added on 04/05/10 > Small business ... | small_business | High Profile new business loan | 605xx | IL | 19.44 | 0 | 1989-07 | 710 | 714 | 7 | <NA> | 115 | 14 | 1 | 51322.0 | 44.2 | 26 | f | 0.0 | 0.0 | 3101.88 | 3029.48 | 2067.04 | 1030.94 | 0.0 | 3.9 | 0.0 | 2010-10 | 516.98 | 2011-03 | 2019-03 | 664 | 660 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 0 | 0.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 1 | 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
Nothing jumps out from this small random sample. Rather than trying to guess why certain rows do not meet the credit policy, I'll exclude these rows.
bool_index = filtered_loan_data["loan_status"].str.startswith("Does not meet")
filtered_loan_data = filtered_loan_data[~bool_index]
loan_status_counts = get_value_counts(filtered_loan_data["loan_status"])
display(style_value_counts(loan_status_counts))
| count | |
|---|---|
| loan_status | |
| Fully Paid | 1,076,751 |
| Current | 878,317 |
| Charged Off | 268,559 |
| Late (31-120 days) | 21,467 |
| In Grace Period | 8,436 |
| Late (16-30 days) | 4,349 |
| Default | 40 |
issue_d¶
The description of this feature is "The month which the loan was funded".
After rows with problematic values of loan_status have been filtered out, there are no
missing values for issue_d.
filtered_loan_data["issue_d"].isna().sum()
0
to_plot = get_group_sizes(filtered_loan_data, group_by="issue_d")
fig = px.line(
to_plot,
x="issue_d",
y="count",
markers=True,
labels={"issue_d": "Loan date", "count": "Number of loans"},
hover_data={"count": ":.3s"},
title="Number of accepted loans by date",
)
fig.show()
I will exclude pre-2012 dates from the analysis.
Analysis and prediction based on this data will need to take account of changes in behavior over time, and given the relatively small number of loans issued before 2012, it is not worthwhile to include the pre-2012 data.
bool_index = filtered_loan_data["issue_d"] >= "2012-01"
filtered_loan_data = filtered_loan_data[bool_index]
to_plot = get_group_sizes(filtered_loan_data, group_by="issue_d")
fig = px.line(
to_plot,
x="issue_d",
y="count",
markers=True,
labels={"issue_d": "Loan date", "count": "Number of loans"},
hover_data={"count": ":.3s"},
title="Number of accepted loans by date",
)
fig.show()
loan_amnt / funded_amnt / funded_amnt_inv¶
What is the distinction between loan_amnt, funded_amnt, funded_amnt_inv?
Start by examining the feature descriptions.
amount_features = acc_loan_feat_desc.loc[
["loan_amnt", "funded_amnt", "funded_amnt_inv"], ["description"]
]
display(style_loan_summary(amount_features))
| description | |
|---|---|
| column name | |
| loan_amnt | The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value. |
| funded_amnt | The total amount committed to that loan at that point in time. |
| funded_amnt_inv | The total amount committed by investors for that loan at that point in time. |
What should we infer in cases where loan_amnt is different than funded_amnt, or in
cases where funded_amnt is different than funded_amnt_inv? It's not completely
clear from these descriptions.
Take a closer look at these cases.
# First check for missing values.
for column_name in ["loan_amnt", "funded_amnt", "funded_amnt_inv"]:
na_count = filtered_loan_data[column_name].isna().sum()
print(f'\nThe number of missing values for feature "{column_name}" is {na_count}.')
The number of missing values for feature "loan_amnt" is 0. The number of missing values for feature "funded_amnt" is 0. The number of missing values for feature "funded_amnt_inv" is 0.
bool_index = (filtered_loan_data["loan_amnt"] - filtered_loan_data["funded_amnt"]) != 0
print(
'\nThe number of loans with "loan_amnt" different than "funded_amnt" is '
f"{sum(bool_index)}.\n"
)
partially_funded_loans = filtered_loan_data[bool_index]
The number of loans with "loan_amnt" different than "funded_amnt" is 68.
funding_gap = (
partially_funded_loans["loan_amnt"] - partially_funded_loans["funded_amnt"]
)
funding_gap.name = "funding_gap"
funding_gap.to_frame().describe()
| funding_gap | |
|---|---|
| count | 68.0 |
| mean | 7384.191176 |
| std | 4779.019572 |
| min | 25.0 |
| 25% | 3843.75 |
| 50% | 7062.5 |
| 75% | 10281.25 |
| max | 21150.0 |
The differences between funded_amnt and loan_amnt range from \$25 to over \$21k.
This is consistent with the description of loan_amnt as the amount requested by the
borrower.
to_plot = get_group_sizes(partially_funded_loans, group_by="issue_d")
fig = px.scatter(
to_plot,
x="issue_d",
y="count",
labels={"issue_d": "Loan date", "count": "Number of loans"},
hover_data={"count": ":,d"},
title='Date of loans with "loan_amnt" different than "funded_amnt"',
)
fig.show()
bool_index = (
filtered_loan_data["funded_amnt"] - filtered_loan_data["funded_amnt_inv"]
) != 0
print(
'\nThe number of loans with "funded_amnt" different than "funded_amnt_inv" is '
f"{sum(bool_index)}.\n"
)
partially_funded_by_investors = filtered_loan_data[bool_index]
The number of loans with "funded_amnt" different than "funded_amnt_inv" is 129094.
investor_funding_gap = (
partially_funded_by_investors["funded_amnt"]
- partially_funded_by_investors["funded_amnt_inv"]
)
investor_funding_gap.name = "investor_funding_gap"
investor_funding_gap.to_frame().describe()
| investor_funding_gap | |
|---|---|
| count | 129094.0 |
| mean | 94.58058 |
| std | 173.332427 |
| min | 0.346054 |
| 25% | 25.0 |
| 50% | 50.0 |
| 75% | 100.0 |
| max | 28925.0 |
The differences between funded_amnt_inv and funded_amnt range from less than a
dollar to around \$29k.
The fact that the difference is always positive is consistent with the descriptions of these two features.
It's a little surprising to see the value \$0.346054, since the value is specified to a
small fraction of a cent. Examine the values of funded_amnt and funded_amnt_inv in
cases where the difference between them is less than a dollar.
to_check = partially_funded_by_investors[["funded_amnt", "funded_amnt_inv"]].assign(
investor_funding_gap=investor_funding_gap
)
to_check[to_check["investor_funding_gap"] < 1]
| funded_amnt | funded_amnt_inv | investor_funding_gap | |
|---|---|---|---|
| 523236 | 14000.0 | 13999.653946 | 0.346054 |
| 1014559 | 35000.0 | 34999.573964 | 0.426036 |
| 1040719 | 28000.0 | 27999.0923 | 0.9077 |
| 1481021 | 19200.0 | 19199.516656 | 0.483344 |
| 1910970 | 7850.0 | 7849.445596 | 0.554404 |
| 1913361 | 24375.0 | 24374.467907 | 0.532093 |
| 1913665 | 9000.0 | 8999.386905 | 0.613095 |
| 1913748 | 14125.0 | 14124.448796 | 0.551204 |
| 1913864 | 24925.0 | 24924.456741 | 0.543259 |
| 1914004 | 21000.0 | 20999.448797 | 0.551203 |
It appears that in cases, funded_amnt_inv was specified with excessive precision.
to_plot = get_group_sizes(partially_funded_by_investors, group_by="issue_d")
fig = px.scatter(
to_plot,
x="issue_d",
y="count",
labels={"issue_d": "Loan date", "count": "Number of loans"},
hover_data={"count": ":,d"},
title='Date of loans with "funded_amnt" different than "funded_amnt_inv"',
)
fig.show()
Discussion:
- Only 68 of the 2.2 million loans have
loan_amntdifferent thanfunded_amnt. Essentially all the loans are fully funded. - About 130k of the loans have different values for
funded_amntandfunded_amnt_inv. Is LendingClub itself providing funding in these case? - The funding gaps expressed as (
loan_amnt-funded_amnt) and (funded_amnt-funded_amnt_inv) are always positive, as expected from the descriptions of the three features.
While I don't understand the cause of the differences between loan_amnt,
funded_amnt, and funded_amnt_inv, I won't filter out the rows with different values
for these features. Unlike the rows where loan_status includes the string "Does not meet the credit policy", there isn't a strong indication that rows with different
values for loan_amnt, funded_amnt, and funded_amnt_inv are fundamentally
problematic.
initial_list_status¶
The feature initial_list_status is explained in this blog
post:
The variable initial_list_status is available in the public data and identifies whether a loan was initially listed in the whole (W) or fractional (F) market. Loans listed “whole” become available for fractional funding (and vice versa) if there are no buyers within a certain time frame.
list_status_counts = get_value_counts(filtered_loan_data["initial_list_status"])
display(style_value_counts(list_status_counts))
| count | |
|---|---|
| initial_list_status | |
| w | 1,535,467 |
| f | 682,666 |
Given this explanation of the feature initial_list_status, there's no need to drop the
feature or filter out rows based on the value of the feature
int_rate¶
The notebook analysis-01.ipynb presents an in-depth analysis of interest rates for the
accepted loans. While doing that analysis, I found that some loans had an anomalously
low interest rate, given the poor grade assigned to the loans by LendingClub.
Here I present analysis showing that some interest rates are anomalously low, and I explore possible explanations. Since none of the explanations are well supported by the data, I filter out the corresponding loans.
Note on the code for visualization: I use plotly histograms to highlight the anomalous interest rates, and the binning for these histograms needs to be done outside of plotly. The reason is that plotly does binning in JavaScript, and so unbinned data passed to plotly's histogram function becomes part of the javascript code stored with the notebook. For the current data set, this can increase the notebook size on disk by a factor of more than 100.
min = filtered_loan_data["int_rate"].min()
max = filtered_loan_data["int_rate"].max()
print(
'The minimum and maximum values of "int_rate" '
f"are {min} and {max}, respectively."
)
The minimum and maximum values of "int_rate" are 5.31 and 30.99, respectively.
# Define arrays / lists needed for binning the histogram and plotting the bins in
# plotly.
int_rate_bins = np.linspace(5, 31, num=27)
int_rate_bin_labels = [f"{left:d}% - {left+0.99:.2f}%" for left in range(5, 31)]
int_rate_tick_vals = int_rate_bin_labels[0::5]
int_rate_tick_text = [f"{left}%" for left in range(5, 35, 5)]
filtered_loan_data["int_rate_bin"] = pd.cut(
filtered_loan_data["int_rate"],
bins=int_rate_bins,
labels=int_rate_bin_labels,
right=False,
)
to_plot = get_group_sizes(filtered_loan_data, group_by="int_rate_bin")
fig = px.bar(
to_plot,
x="int_rate_bin",
y="count",
labels={"count": "Number of loans", "int_rate_bin": "Interest rate"},
title="Distribution of loan interest rate",
)
customdata = to_plot["int_rate_bin"]
hovertemplate = "Interest rate=%{customdata}<br>Number of loans=%{y:.3s}<extra></extra>"
fig.update_traces(customdata=customdata, hovertemplate=hovertemplate)
fig.update_layout(bargap=0)
fig.update_xaxes(
tickmode="array", tickvals=int_rate_tick_vals, ticktext=int_rate_tick_text
)
fig.show()
to_plot = get_group_sizes(filtered_loan_data, group_by=["int_rate_bin", "grade"])
fig = px.bar(
to_plot,
x="int_rate_bin",
y="count",
facet_row="grade",
labels={
"count": "Number of loans",
"int_rate_bin": "Interest rate",
"grade": "Grade",
},
title="Distribution of loan interest rate by loan grade",
hover_data={"count": ":.3s"},
height=1200,
)
fig.update_xaxes(tickmode="array", tickvals=int_rate_tick_vals, tickangle=45)
fig.update_yaxes(matches=None, title="")
fig.update_layout(bargap=0, yaxis4_title="Number of Loans")
fig.show()
to_plot = get_group_sizes(
filtered_loan_data, group_by=["int_rate_bin", "grade", "sub_grade"]
)
to_plot["sub_grade"] = to_plot["sub_grade"].str[1]
fig = px.bar(
to_plot,
x="int_rate_bin",
y="count",
facet_row="grade",
color="sub_grade",
labels={
"count": "Number of loans",
"int_rate_bin": "Interest rate",
"grade": "Grade",
"sub_grade": "Sub-grade",
},
title="Distribution of loan interest rate by loan grade and sub-grade",
hover_data={"count": ":.3s"},
height=1200,
)
fig.update_xaxes(tickmode="array", tickvals=int_rate_tick_vals, tickangle=45)
fig.update_yaxes(matches=None, title="")
fig.update_layout(bargap=0, yaxis4_title="Number of Loans")
fig.show()
The previous plots show the following:
- The interest rate varies systematically with the loan grade.
- For loans with a high grade, the interest rate varies systemtically with loan sub-grade.
- For loans with a low grade, the dependence of interest rate on sub-grade is complex.
To get a different view of these patterns, I'll set the y-axis to show percentage of loans rather than number of loans in each category.
to_plot = get_group_sizes(
filtered_loan_data, group_by=["int_rate_bin", "grade", "sub_grade"]
)
to_plot["sub_grade"] = to_plot["sub_grade"].str[1]
fig = px.histogram(
to_plot,
x="int_rate_bin",
y="count",
facet_row="grade",
color="sub_grade",
labels={
"count": "Number of loans",
"int_rate_bin": "Interest rate",
"grade": "Grade",
"sub_grade": "Sub-grade",
},
barnorm="fraction",
title="Distribution of loan interest rate by loan grade and sub-grade",
height=1200,
)
def clean_up_hovertemplate(trace):
trace.hovertemplate = trace.hovertemplate.replace(
"sum of Number of loans (normalized as fraction)", "Percentage"
)
fig.for_each_trace(clean_up_hovertemplate)
fig.update_xaxes(tickmode="array", tickvals=int_rate_tick_vals, tickangle=45)
fig.update_yaxes(title="", tickformat=".0%")
fig.update_layout(bargap=0, yaxis4_title="Number of loans")
fig.show()
The plot above shows the anomalous interest rates. For example, most of the loans in the bottom row, which correspond to the lowest grade, have an interest rate of at least 22%, but the plot also shows a block of loans in the bottom row with interest rate in the range of 6% - 6.99%.
Possible explanations that I explored:
- These rates are associated with a hardship plan or settlement plan.
- The funding mechanism for these loans is unusual.
- These loans are associated with some external event and were all issued at around the same time.
- An improvement in the borrower's FICO score may have caused the interest rate to be lowered without changing the assigned loan grade.
As shown below, I could not find support in the data for any of these guesses.
First select all of anomalous cases with interest rate in the range of 6% - 6.99%. There are 625 of these loans.
anomalous_int_rate_data = filtered_loan_data[
(filtered_loan_data["grade"] != "A") & (filtered_loan_data["int_rate"] < 7)
]
print(f"Number of loans: {len(anomalous_int_rate_data.index)}.")
Number of loans: 625.
Check the category of loans with grade 'D' and interest rate 8% - 8.99% shown in the plot above. There is only 1 loan in this category.
bool_index = (filtered_loan_data["grade"] == "D") & (
filtered_loan_data["int_rate"].between(7.99, 9, inclusive="neither")
)
print(f"Number of loans: {sum(bool_index)}.")
Number of loans: 1.
I will focus on the anomalous cases with interest rate in the range of 6% - 6.99%.
All of these loans have the same rate: 6.0%.
anomalous_rates = list(anomalous_int_rate_data["int_rate"].unique())
print(f"Distinct values of interest rate: {anomalous_rates}")
Distinct values of interest rate: [6.0]
Only a few are associated with a hardship plan or settlement plan.
display(
style_value_counts(get_value_counts(anomalous_int_rate_data["hardship_status"]))
)
display(
style_value_counts(get_value_counts(anomalous_int_rate_data["settlement_term"]))
)
| count | |
|---|---|
| hardship_status | |
| <NA> | 621 |
| COMPLETED | 3 |
| BROKEN | 1 |
| count | |
|---|---|
| settlement_term | |
| <NA> | 622 |
| 1 | 2 |
| 18 | 1 |
All of these loans are fully funded. For 46 of the loans, the value of funded_amnt is
different from the value of funded_amnt_inv, but most of the loans are fully funded by
investors.
bool_index = (
anomalous_int_rate_data["loan_amnt"] - anomalous_int_rate_data["funded_amnt"]
) != 0
print(f"The number of these loans that are not fully funded is {sum(bool_index)}.")
The number of these loans that are not fully funded is 0.
bool_index = (
anomalous_int_rate_data["funded_amnt"] - anomalous_int_rate_data["funded_amnt_inv"]
) != 0
print(
'The number of these loans with "funded_amnt" different than "funded_amnt_inv" '
f"is {sum(bool_index)}."
)
The number of these loans with "funded_amnt" different than "funded_amnt_inv" is 46.
The dates of these loans are spread over the full ranges of dates of the filtered data, so they do not appear to be associated with an external event.
to_plot = get_group_sizes(anomalous_int_rate_data, group_by="issue_d")
fig = px.scatter(
to_plot,
x="issue_d",
y="count",
labels={"issue_d": "Loan date", "count": "Number of loans"},
title="Dates of loans with anomalous interest rate 6.0%",
)
fig.show()
Look at a random sample of these records and check whether the FICO score improved from the value it had at loan origination.
# Define a reproducible random state for sampling randomly
rng = np.random.default_rng(seed=16513)
sampled_data = anomalous_int_rate_data.sample(10, random_state=rng)
fico_columns = [
"fico_range_low",
"fico_range_high",
"last_fico_range_low",
"last_fico_range_high",
]
style_loan_summary(sampled_data[fico_columns])
| fico_range_low | fico_range_high | last_fico_range_low | last_fico_range_high | |
|---|---|---|---|---|
| 2210886 | 675 | 679 | 725 | 729 |
| 50250 | 720 | 724 | 690 | 694 |
| 18516 | 690 | 694 | 685 | 689 |
| 1756456 | 660 | 664 | 625 | 629 |
| 1301075 | 695 | 699 | 700 | 704 |
| 89739 | 715 | 719 | 680 | 684 |
| 1012395 | 695 | 699 | 640 | 644 |
| 1290147 | 705 | 709 | 710 | 714 |
| 594599 | 695 | 699 | 655 | 659 |
| 2046566 | 745 | 749 | 605 | 609 |
The columns fico_range_low and fico_range_high correspond to values at loan
origination, while the columns last_fico_range_low and last_fico_range_high
correspond to a more recent check of the credit rating. This random sample does not
show a systematic jump in FICO scores for the loans with anomalously low interest rates.
Inspect the columns of the data frame for a patterns that might explain the low rates.
with pd.option_context("display.max_columns", None):
display(sampled_data)
| id | loan_amnt | funded_amnt | funded_amnt_inv | term | int_rate | installment | grade | sub_grade | emp_title | emp_length | home_ownership | annual_inc | verification_status | issue_d | loan_status | pymnt_plan | url | desc | purpose | title | zip_code | addr_state | dti | delinq_2yrs | earliest_cr_line | fico_range_low | fico_range_high | inq_last_6mths | mths_since_last_delinq | mths_since_last_record | open_acc | pub_rec | revol_bal | revol_util | total_acc | initial_list_status | out_prncp | out_prncp_inv | total_pymnt | total_pymnt_inv | total_rec_prncp | total_rec_int | total_rec_late_fee | recoveries | collection_recovery_fee | last_pymnt_d | last_pymnt_amnt | next_pymnt_d | last_credit_pull_d | last_fico_range_high | last_fico_range_low | collections_12_mths_ex_med | mths_since_last_major_derog | policy_code | application_type | annual_inc_joint | dti_joint | verification_status_joint | acc_now_delinq | tot_coll_amt | tot_cur_bal | open_acc_6m | open_act_il | open_il_12m | open_il_24m | mths_since_rcnt_il | total_bal_il | il_util | open_rv_12m | open_rv_24m | max_bal_bc | all_util | total_rev_hi_lim | inq_fi | total_cu_tl | inq_last_12m | acc_open_past_24mths | avg_cur_bal | bc_open_to_buy | bc_util | chargeoff_within_12_mths | delinq_amnt | mo_sin_old_il_acct | mo_sin_old_rev_tl_op | mo_sin_rcnt_rev_tl_op | mo_sin_rcnt_tl | mort_acc | mths_since_recent_bc | mths_since_recent_bc_dlq | mths_since_recent_inq | mths_since_recent_revol_delinq | num_accts_ever_120_pd | num_actv_bc_tl | num_actv_rev_tl | num_bc_sats | num_bc_tl | num_il_tl | num_op_rev_tl | num_rev_accts | num_rev_tl_bal_gt_0 | num_sats | num_tl_120dpd_2m | num_tl_30dpd | num_tl_90g_dpd_24m | num_tl_op_past_12m | pct_tl_nvr_dlq | percent_bc_gt_75 | pub_rec_bankruptcies | tax_liens | tot_hi_cred_lim | total_bal_ex_mort | total_bc_limit | total_il_high_credit_limit | revol_bal_joint | sec_app_fico_range_low | sec_app_fico_range_high | sec_app_earliest_cr_line | sec_app_inq_last_6mths | sec_app_mort_acc | sec_app_open_acc | sec_app_revol_util | sec_app_open_act_il | sec_app_num_rev_accts | sec_app_chargeoff_within_12_mths | sec_app_collections_12_mths_ex_med | sec_app_mths_since_last_major_derog | hardship_flag | hardship_type | hardship_reason | hardship_status | deferral_term | hardship_amount | hardship_start_date | hardship_end_date | payment_plan_start_date | hardship_length | hardship_dpd | hardship_loan_status | orig_projected_additional_accrued_interest | hardship_payoff_balance_amount | hardship_last_payment_amount | disbursement_method | debt_settlement_flag | debt_settlement_flag_date | settlement_status | settlement_date | settlement_amount | settlement_percentage | settlement_term | int_rate_bin | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2210886 | 92449578 | 25725.0 | 25725.0 | 25725.0 | 60 | 6.0 | 495.59 | C | C5 | Logistics | 10+ years | MORTGAGE | 72000.0 | Verified | 2016-11 | Current | False | https://lendingclub.com/browse/loanDetail.acti... | <NA> | debt_consolidation | Debt consolidation | 581xx | ND | 25.64 | 1 | 1994-09 | 675 | 679 | 0 | 14 | <NA> | 29 | 0 | 22715.0 | 42.4 | 49 | w | 14621.67 | 14621.67 | 14802.01 | 14802.01 | 11103.33 | 3698.68 | 0.0 | 0.0 | 0.0 | 2019-03 | 495.59 | 2019-04 | 2019-03 | 729 | 725 | 0 | <NA> | 1.0 | Joint App | 102000.0 | 24.37 | Not Verified | 0 | 0.0 | 304691.0 | 1 | 6 | 0 | 1 | 18 | 21660.0 | 63.0 | 1 | 7 | 6533.0 | 50.0 | 53600.0 | 0 | 0 | 2 | 10 | 11719.0 | 13142.0 | 59.4 | 0 | 0.0 | 265 | 119 | 3 | 3 | 3 | 21 | <NA> | 1 | <NA> | 0 | 10 | 11 | 11 | 16 | 13 | 22 | 33 | 11 | 28 | 0 | 0 | 0 | 2 | 95.9 | 41.7 | 0 | 0 | 352077.0 | 44375.0 | 32400.0 | 34546.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 6% - 6.99% |
| 50250 | 65673954 | 2500.0 | 2500.0 | 2500.0 | 36 | 6.0 | 77.64 | C | C1 | courtesy associate | 10+ years | OWN | 55000.0 | Source Verified | 2015-11 | Fully Paid | False | https://lendingclub.com/browse/loanDetail.acti... | <NA> | debt_consolidation | Debt consolidation | 270xx | NC | 16.71 | 0 | 2000-06 | 720 | 724 | 0 | 58 | <NA> | 11 | 0 | 5032.0 | 29.6 | 27 | w | 0.0 | 0.0 | 2983.286179 | 2983.29 | 2500.0 | 483.29 | 0.0 | 0.0 | 0.0 | 2018-12 | 0.03 | <NA> | 2018-12 | 694 | 690 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | 0.0 | 42504.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 17000.0 | <NA> | <NA> | <NA> | 6 | 4723.0 | 3729.0 | 42.6 | 0 | 0.0 | 174 | 185 | 4 | 3 | 0 | 4 | <NA> | 17 | <NA> | 0 | 1 | 3 | 3 | 4 | 15 | 9 | 12 | 3 | 11 | 0 | 0 | 0 | 3 | 96.3 | 33.3 | 0 | 0 | 55388.0 | 42504.0 | 6500.0 | 38388.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 6% - 6.99% |
| 18516 | 66024347 | 13675.0 | 13675.0 | 13675.0 | 60 | 6.0 | 272.39 | E | E4 | <NA> | <NA> | RENT | 36000.0 | Verified | 2015-12 | Fully Paid | False | https://lendingclub.com/browse/loanDetail.acti... | <NA> | credit_card | Credit card refinancing | 748xx | OK | 32.3 | 0 | 2007-04 | 690 | 694 | 0 | 43 | <NA> | 8 | 0 | 3508.0 | 25.1 | 22 | w | 0.0 | 0.0 | 16505.636346 | 16505.64 | 13675.0 | 2830.64 | 0.0 | 0.0 | 0.0 | 2017-11 | 8208.82 | <NA> | 2018-10 | 689 | 685 | 0 | 43 | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | 0.0 | 25090.0 | 1 | 3 | 1 | 3 | 3 | 21582.0 | 81.0 | 2 | 3 | 1277.0 | 62.0 | 14000.0 | 1 | 0 | 2 | 6 | 3136.0 | 2492.0 | 42.0 | 0 | 0.0 | 104 | 97 | 7 | 3 | 0 | 7 | <NA> | 7 | <NA> | 1 | 2 | 3 | 3 | 3 | 17 | 5 | 5 | 3 | 8 | 0 | 0 | 0 | 3 | 86.4 | 33.3 | 0 | 0 | 40567.0 | 25090.0 | 4300.0 | 26567.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 6% - 6.99% |
| 1756456 | 9836985 | 14675.0 | 14675.0 | 14675.0 | 36 | 6.0 | 455.52 | C | C3 | Maintenance Tech | 1 year | RENT | 55000.0 | Not Verified | 2013-12 | Fully Paid | False | https://lendingclub.com/browse/loanDetail.acti... | <NA> | credit_card | Consolidation | 444xx | OH | 12.55 | 2 | 2008-08 | 660 | 664 | 0 | 9 | <NA> | 9 | 0 | 6852.0 | 57.1 | 13 | f | 0.0 | 0.0 | 16942.459999 | 16942.46 | 14675.0 | 2242.03 | 25.43 | 0.0 | 0.0 | 2015-09 | 7223.39 | <NA> | 2017-07 | 629 | 625 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | 0.0 | 25886.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 12000.0 | <NA> | <NA> | <NA> | 2 | 2876.0 | 4056.0 | 61.7 | 0 | 0.0 | 51 | 64 | 3 | 3 | 0 | 3 | 9 | 15 | 9 | 0 | 4 | 6 | 4 | 5 | 6 | 6 | 7 | 6 | 9 | 0 | 0 | 0 | 1 | 83.3 | 25.0 | 0 | 0 | 33065.0 | 25886.0 | 10600.0 | 21065.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 6% - 6.99% |
| 1301075 | 13408227 | 15600.0 | 15600.0 | 15600.0 | 60 | 6.0 | 317.38 | C | C3 | POLICE OFFICER | 8 years | MORTGAGE | 71000.0 | Source Verified | 2014-04 | Fully Paid | False | https://lendingclub.com/browse/loanDetail.acti... | <NA> | debt_consolidation | Debt consolidation | 337xx | FL | 26.08 | 0 | 1999-11 | 695 | 699 | 0 | <NA> | <NA> | 18 | 0 | 16189.0 | 40.1 | 47 | w | 0.0 | 0.0 | 20398.094502 | 20398.09 | 15600.0 | 4798.09 | 0.0 | 0.0 | 0.0 | 2017-07 | 6606.72 | <NA> | 2017-12 | 704 | 700 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | 0.0 | 241875.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 40400.0 | <NA> | <NA> | <NA> | 10 | 13438.0 | 5508.0 | 53.7 | 0 | 0.0 | 172 | 113 | 10 | 8 | 4 | 15 | <NA> | 12 | <NA> | 0 | 6 | 8 | 8 | 14 | 20 | 13 | 23 | 8 | 18 | 0 | 0 | 0 | 4 | 100.0 | 37.5 | 0 | 0 | 280224.0 | 67895.0 | 11900.0 | 59924.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 6% - 6.99% |
| 89739 | 63326688 | 25000.0 | 25000.0 | 24950.0 | 60 | 6.0 | 507.9 | D | D2 | Attorney | 3 years | MORTGAGE | 73600.0 | Source Verified | 2015-10 | Current | False | https://lendingclub.com/browse/loanDetail.acti... | <NA> | debt_consolidation | Debt consolidation | 324xx | FL | 22.68 | 0 | 2000-10 | 715 | 719 | 1 | <NA> | <NA> | 22 | 0 | 12795.0 | 95.5 | 41 | w | 9644.51 | 9625.22 | 23116.93 | 23070.7 | 15355.49 | 7761.44 | 0.0 | 0.0 | 0.0 | 2019-03 | 507.9 | 2019-04 | 2019-03 | 684 | 680 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | 0.0 | 293111.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 13400.0 | <NA> | <NA> | <NA> | 8 | 13323.0 | 1.0 | 99.9 | 0 | 0.0 | 180 | 136 | 5 | 3 | 2 | 96 | <NA> | 5 | <NA> | 0 | 1 | 3 | 1 | 1 | 34 | 3 | 5 | 3 | 22 | 0 | 0 | 0 | 3 | 100.0 | 100.0 | 0 | 0 | 351325.0 | 137837.0 | 1000.0 | 166313.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 6% - 6.99% |
| 1012395 | 75299976 | 30000.0 | 30000.0 | 30000.0 | 60 | 6.0 | 686.58 | F | F3 | Management Assistant | 10+ years | MORTGAGE | 79098.0 | Source Verified | 2016-03 | Current | False | https://lendingclub.com/browse/loanDetail.acti... | <NA> | debt_consolidation | Debt consolidation | 221xx | VA | 17.92 | 0 | 2002-01 | 695 | 699 | 0 | 64 | <NA> | 11 | 0 | 3385.0 | 30.0 | 23 | w | 16118.06 | 16118.06 | 29411.19 | 29411.19 | 13881.94 | 15294.51 | 234.74 | 0.0 | 0.0 | 2019-02 | 686.58 | 2019-04 | 2019-03 | 644 | 640 | 0 | 64 | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | 0.0 | 367052.0 | 2 | 5 | 2 | 4 | 5 | 84304.0 | 124.0 | 1 | 2 | 2005.0 | 110.0 | 11300.0 | 1 | 1 | 2 | 7 | 33368.0 | 4915.0 | 40.8 | 0 | 0.0 | 104 | 170 | 11 | 5 | 2 | 11 | 81 | 5 | 81 | 2 | 3 | 3 | 3 | 5 | 11 | 4 | 9 | 3 | 11 | 0 | 0 | 0 | 4 | 90.9 | 0.0 | 0 | 0 | 363394.0 | 87689.0 | 8300.0 | 68047.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | INTEREST ONLY-3 MONTHS DEFERRAL | FAMILY_DEATH | BROKEN | 3 | 532.15 | 2017-08 | 2017-09 | 2017-08 | 3 | 26 | Late (16-30 days) | <NA> | 26593.67 | 26.13 | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 6% - 6.99% |
| 1290147 | 14530462 | 26000.0 | 26000.0 | 26000.0 | 60 | 6.0 | 511.92 | D | D1 | Budget Analyst | 10+ years | MORTGAGE | 75000.0 | Verified | 2014-04 | Fully Paid | False | https://lendingclub.com/browse/loanDetail.acti... | <NA> | debt_consolidation | Debt consolidation | 226xx | VA | 24.94 | 1 | 2000-10 | 705 | 709 | 0 | 13 | <NA> | 16 | 0 | 19651.0 | 52.6 | 44 | w | 0.0 | 0.0 | 32326.93 | 32326.93 | 26000.0 | 6326.93 | 0.0 | 0.0 | 0.0 | 2016-03 | 17799.6 | <NA> | 2019-03 | 714 | 710 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | 0.0 | 70119.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 33800.0 | <NA> | <NA> | <NA> | 7 | 5843.0 | 7349.0 | 68.8 | 0 | 0.0 | 162 | 162 | 10 | 3 | 4 | 10 | 13 | 12 | 13 | 0 | 3 | 3 | 5 | 18 | 8 | 12 | 32 | 3 | 16 | 0 | 0 | 0 | 3 | 95.2 | 50.0 | 0 | 0 | 99474.0 | 70119.0 | 27000.0 | 65674.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 6% - 6.99% |
| 594599 | 116207885 | 20000.0 | 20000.0 | 20000.0 | 60 | 6.0 | 321.16 | E | E4 | Leasing Credit Officer | 10+ years | RENT | 70000.0 | Source Verified | 2017-08 | Current | False | https://lendingclub.com/browse/loanDetail.acti... | <NA> | medical | Medical expenses | 606xx | IL | 11.33 | 0 | 2006-07 | 695 | 699 | 1 | 26 | <NA> | 15 | 0 | 10065.0 | 33.4 | 31 | w | 11878.81 | 11878.81 | 14049.25 | 14049.25 | 8121.19 | 5928.06 | 0.0 | 0.0 | 0.0 | 2019-03 | 321.16 | 2019-04 | 2019-03 | 659 | 655 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | 0.0 | 308544.0 | 2 | 5 | 0 | 2 | 17 | 42244.0 | 96.0 | 0 | 0 | 3084.0 | 71.0 | 30100.0 | 4 | 2 | 3 | 4 | 20570.0 | 13601.0 | 18.6 | 0 | 0.0 | 133 | 132 | 35 | 2 | 3 | 61 | <NA> | 0 | <NA> | 0 | 2 | 6 | 3 | 4 | 19 | 8 | 9 | 6 | 15 | 0 | 0 | 0 | 2 | 93.1 | 0.0 | 0 | 0 | 328931.0 | 52309.0 | 16700.0 | 43920.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 6% - 6.99% |
| 2046566 | 126149005 | 10000.0 | 10000.0 | 10000.0 | 36 | 6.0 | 292.38 | D | D1 | Lead | 9 years | RENT | 60000.0 | Source Verified | 2017-12 | Current | False | https://lendingclub.com/browse/loanDetail.acti... | <NA> | debt_consolidation | Debt consolidation | 104xx | NY | 26.86 | 0 | 2010-11 | 745 | 749 | 2 | <NA> | <NA> | 9 | 0 | 3378.0 | 15.7 | 11 | w | 5814.79 | 5814.79 | 6026.72 | 6026.72 | 4185.21 | 1787.96 | 53.55 | 0.0 | 0.0 | 2019-03 | 999.19 | 2019-04 | 2019-03 | 609 | 605 | 0 | <NA> | 1.0 | Individual | <NA> | <NA> | <NA> | 0 | 0.0 | 66278.0 | 3 | 2 | 1 | 1 | 3 | 62900.0 | 98.0 | 2 | 3 | 1416.0 | 53.0 | 21500.0 | 0 | 2 | 5 | 4 | 9468.0 | 5093.0 | 24.0 | 0 | 0.0 | 53 | 85 | 1 | 1 | 0 | 1 | <NA> | 1 | <NA> | 0 | 4 | 5 | 6 | 7 | 3 | 7 | 8 | 5 | 9 | 0 | 0 | 0 | 3 | 100.0 | 25.0 | 0 | 0 | 104891.0 | 66278.0 | 6700.0 | 83391.0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | Cash | False | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 6% - 6.99% |
I don't see any interesting patterns in this sample, so I will filter out the 626 rows that have anomalously low interest rates.
int_rate_is_anomalous = (filtered_loan_data["grade"] != "A") & (
filtered_loan_data["int_rate"] < 7
)
int_rate_is_anomalous = int_rate_is_anomalous | (
(filtered_loan_data["grade"] == "D") & (filtered_loan_data["int_rate"] < 9)
)
filtered_loan_data = filtered_loan_data[~int_rate_is_anomalous]
hardship_type, hardship_length, and deferral_term¶
The features hardship_type, hardship_length, and deferral_term each have only one value (other
than <NA>).
hardship_type_counts = get_value_counts(filtered_loan_data["hardship_type"])
display(style_value_counts(hardship_type_counts))
| count | |
|---|---|
| hardship_type | |
| <NA> | 2,206,594 |
| INTEREST ONLY-3 MONTHS DEFERRAL | 10,913 |
hardship_length_counts = get_value_counts(filtered_loan_data["hardship_length"])
display(style_value_counts(hardship_length_counts))
| count | |
|---|---|
| hardship_length | |
| <NA> | 2,206,594 |
| 3 | 10,913 |
deferral_term_counts = get_value_counts(filtered_loan_data["deferral_term"])
display(style_value_counts(deferral_term_counts))
| count | |
|---|---|
| deferral_term | |
| <NA> | 2,206,594 |
| 3 | 10,913 |
It appears that loans with <NA> in these columns are not associated with a hardship plan, so each of these columns can be used as a flag for hardship plans.
If these columns are dropped, can we infer from other columns which loans are associated with a hardship plan?
Yes, because a loan has hardship_amount equal to <NA> if and only if hardship_type, hardship_length, and deferral_term are <NA>.
hardship_amount_isna = filtered_loan_data["hardship_amount"].isna()
matching_na = (
hardship_amount_isna.equals(filtered_loan_data["hardship_type"].isna())
and hardship_amount_isna.equals(filtered_loan_data["hardship_length"].isna())
and hardship_amount_isna.equals(filtered_loan_data["deferral_term"].isna())
)
if matching_na:
print(
"The columns hardship_amount, hardship_type, hardship_length, "
"and deferral term have <NA> values at the same positions."
)
else:
print("Mismatch in <NA> values.")
The columns hardship_amount, hardship_type, hardship_length, and deferral term have <NA> values at the same positions.
Filter data¶
Taking account of the feature summaries and the feature exploration above, certain columns will be excluded from the analysis of accepted loans.
- url: URL for the LC page with listing data
- title: The loan title provided by the borrower
- desc: Loan description provided by the borrower
- policy_code: publicly available policy_code=1, new products not publicly available policy_code=2
- hardship_type: Describes the hardship plan offering
- hardship_length: The number of months the borrower will make smaller payments than normally obligated due to a hardship plan
- deferral_term: Amount of months that the borrower is expected to pay less than the contractual monthly payment amount due to a hardship plan
Also, rows will be featured out based on the following criteria:
- Problematic values for
loan_status<NA>Does not meet the credit policy. Status:Fully PaidDoes not meet the credit policy. Status:Charged Off
- Values of
issue_dbefore 2012 - Anomalously low values of
int_rate
to_drop = [
"url",
"title",
"desc",
"policy_code",
"hardship_type",
"hardship_length",
"deferral_term",
]
filtered_loan_data = filtered_loan_data.drop(
to_drop,
axis="columns",
)